-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vdk-core: Greenplum support #361
Conversation
Capability to connect to Greenplum database, and execute queries. A vdk-greenplum plugin introduced, that starts a Greenplum instance in docker, then runs a sample data job to persist data. Tests verify the data has been populated, using the newly introduced vdk-core CLI command "greenplum-query". README.md added, describing on how to use the plugin, enlist greenplum-related vdk-core configurations, and test locally. Testing Done: test_vdk_greenplum_utils added Signed-off-by: ikoleva <[email protected]>
Please let us try to post a bit smaller PRs in the spirit of https://github.com/vmware/versatile-data-kit/wiki/How-to-prepare-a-new-PR#break-your-code-commits-into--small-self-contained-units . This is almost too long (especially for python code, for java it's probably just fine :) ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
What comes to mind is https://mypy.readthedocs.io/en/stable/stubgen.html, so a basic vdk db plugin structure is generated based on an already existing one. Did try it, for me it was easier adapting the vdk-postgres plugin in this particular case |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great submission overall, all my comments are minor.
|
||
# Configuration | ||
|
||
Run vdk config-help - search for those prefixed with "GREENPLUM_" to see what configuration options are available. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes me think whether we should extend config-help to allow config variable sorting. If one has 15 or 20 plugins installed, it might be very annoying to have to scroll through them to find the one you're looking for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is obviously unrelated to this PR.
name="vdk-greenplum", | ||
version=__version__, | ||
url="https://github.com/vmware/versatile-data-kit", | ||
description="Versatile Data Kit SDK plugin provides support for Greenplum database and greenplum transformation templates.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
description="Versatile Data Kit SDK plugin provides support for Greenplum database and greenplum transformation templates.", | |
description="Versatile Data Kit SDK plugin provides support for connecting to a Greenplum database and Greenplum transformation templates.", |
description="Versatile Data Kit SDK plugin provides support for Greenplum database and greenplum transformation templates.", | ||
long_description=pathlib.Path("README.md").read_text(), | ||
long_description_content_type="text/markdown", | ||
install_requires=["vdk-core", "psycopg2-binary"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't tabulate also be included here?
root_command.add_command(greenplum_query) | ||
|
||
|
||
@click.command(name="greenplum-query", help="executes SQL query against Greenplum") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@click.command(name="greenplum-query", help="executes SQL query against Greenplum") | |
@click.command(name="greenplum-query", help="Executes SQL query against Greenplum.") |
os.environ, | ||
{ | ||
VDK_DB_DEFAULT_TYPE: "GREENPLUM", | ||
VDK_GREENPLUM_DBNAME: "postgres", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems weird, unless it's some joke I'm not getting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greenplum is PosgtreSQL-based :) using same postgres
default schema
VDK_GREENPLUM_PORT: "5432", | ||
}, | ||
) | ||
class GreenplumUtilsTests(unittest.TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this class, and the file as well, referencing Greenplum Utils, when it seems to be testing the connection, and there is no greenplum_utils.py file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't quite catch that, could you elaborate?
Capability to connect to Greenplum database, and execute queries.
A vdk-greenplum plugin introduced, that starts a Greenplum instance in
docker, then runs a sample data job to persist data. Tests verify the
data has been populated, using the newly introduced vdk-core CLI
command "greenplum-query". README.md added, describing on how to use the
plugin, enlist greenplum-related vdk-core configurations, and test
locally.
Testing Done: test_vdk_greenplum_utils added
Signed-off-by: ikoleva [email protected]